ABSTRACT | Vision problems are inherently ambiguous: Do
abrupt brightness changes correspond to object boundaries?
Are smooth intensity changes due to shading or material
properties? For stereo: Which point in the left image
corresponds to which point in the right one? What is the role of color
in visual information processing? To answer these (seemingly
different) questions we develop an analogy between the role of
orientation in organizing visual cortex and tangents in
differential geometry. Machine learning experiments suggest using
geometry  as  a  surrogate  for  high-order  statistical  interactions. 
The  cortical  columnar  architecture  becomes  a  bundle  structure 
in  geometry.  Connection  forms  within  these  bundles  suggest 
answers  to  the  above  questions,  and  curvatures  emerge  in  key 
roles.  More  generally,  our  path  through  these  questions 
suggests  an  overall  strategy  for  solving  the  inverse  problems 
of  vision:  decompose  the  global  problems  into  networks  of 
smaller  ones  and  then  seek  constraints  from  these  coupled 
problems  to  reduce  ambiguity.  Neural  computations  thus 
amount  to  satisfying  constraints  rather  than  seeking  uniform 
approximations.  Even  when  no  global  formulation  exists  one 
may  be  able  to  find  localized  structures  on  which  ambiguity  is 
minimal;  these  can  then  anchor  an  overall  approximation. 

I. INTRODUCTION

Cortex  consists  of  billions  of  neurons  and  trillions  of 
synapses,  all  in  support  of  various  neural  computations. 
Key  to  understanding  these  computations  is  building  a 
proper  abstraction.  While  one  routinely  thinks  of  neurons 
as decision-making units, it is most important to understand
which questions they are attempting to answer.
Knowing the answers could suggest insights from neuroscience
to guide engineering theories and applications; at
the  same  time,  practical  considerations  can  provide  insight 
into  neural  computations. 

Our focus is on problems of early and intermediate-level
vision. These problems are difficult for applications
(and  for  brains)  because  they  are  inverse  problems  [94]. 
Computer  graphics,  by  contrast,  is  a  forward  problem: 
shading  can  be  calculated  directly  given  models  of 
surfaces,  viewing  geometry,  and  lighting  [24].  Going  the 
other  way  there  are  (in  general)  many  different  surfaces 
and  lighting  combinations  that  could  account  for  a  given 
shading  distribution.  Structuring  these  inverse  choices  is 
what  makes  vision  an  inference  problem. 

Big  data  and  machine  learning  define,  to  some  extent, 
our  intellectual  environment.  It  is  already  the  case  that 
solutions  to  certain  classification  problems,  such  as 
reading  zip  codes,  can  be  learned  automatically  [70].  But 
how  far  can  one  go:  is  it  possible  to  learn  how  to  infer 
surfaces  from  shading  in  an  unconstrained,  unsupervised 
fashion? We maintain that there are deep insights into
these problems that are geometric in nature and that could
provide novel constraint. And as we will show, the
geometry is also reflected in neurobiology. The lesson, in
short, is that geometry serves (at least) as a surrogate for
higher order statistical analysis. A concrete example in
edge statistics supports this claim, and a surprising result
about the role of color reinforces its usefulness.

Fig. 1. Function of an individual neuron (a) in visual cortex is classically summarized by its receptive field (b). Shown is a Gabor filter tuned
to the vertical orientation. (c) Connections between such neurons define networks and (d) different abstractions of these networks lead
to different theoretical ideas. One focus of this paper is to understand an abstraction based on geometric principles.


A.  From  Neural  Connections  to  Distributed  Models 

The  selectivity  of  individual  neurons  to  patterns  of  light 
has  strongly  influenced  ideas  about  neural  computation  in 
the  visual  system  (Fig.  1).  Receptive  fields,  or  the  pattern  of 
light  to  which  a  neuron  responds,  can  be  related  to  the 
statistics  of  natural  images  by  independent  component 
analysis  [7]  and  sparse  coding  [86].  At  a  larger  scale  there 
are  about  50  anatomically  distinct  visual  areas  [31],  each  of 
which  consists  of  elaborated  networks  of  neurons.  For 
nearly  every  feedforward  connection  from  neurons  in  one 
area  to  the  next,  there  is  a  feedback  projection  from  the 
higher  area. 

Since  receptive  fields  can  be  built  up  from  earlier 
projections,  they  have  been  taken  as  a  proxy  for  feedforward 
connections  between  neurons  in  different  areas.  Repeated 
across  several  “hidden  layers”  we  obtain  a  model  for  cortical 
architecture  (Fig.  2,  middle  row,  right).  Such  deep  network 
models began with the neocognitron [34]; modern extensions
[108] have different nonlinearities imposed between
the feedforward convolutions. Passing the output layer into a
classifier leads to recognition systems [30], [70]. Popular
algorithms exist for both supervised and unsupervised
learning of network parameters [41].


[Fig. 2 level labels: Constraints; Inference Engine; Anatomy & Physiology]


Fig. 2. Levels of explanation are grounded in neurobiology
and include both the inference engine and the constraints on which it
operates. At the inference engine level we show (right) a deep
convolutional network, with many “hidden” layers, that is, in effect,
equivalent to a specialized computation on (middle) directed acyclic
graphs; such graphs are a special case of general graphical models
(left). At the constraint level, which provides the “edges” in the
graphical models, are (left) statistics derived from the world and those
derived from models (right). We will concentrate on geometric models
in this paper.

Fig. 3. From image statistics to constraints. Edges in natural images (a) can be represented as points in (position, orientation) space (b).
The joint probability of multiple edges co-occurring in a large corpus of image patches can be estimated (c). Since this probability
matrix is positive semidefinite, its eigenvectors can provide an embedding (d) in which those edge triples likely to co-occur
appear as clusters [see (3)]. Mapping these clusters back to a position representation reveals the geometry of curves (e). Figure
after [69].


But  there  is  much  more  to  cortical  anatomy.  There  are 
several  interconnected  pathways  in  the  ventral  stream 
implicated  in  object  representation  [61],  and  neurons 
within  each  of  the  areas  participate  in  elaborate  networks 
involving  both  short-range  and  long-range  connections. 
Fig.  2  (bottom)  shows  a  cartoon  elaboration  of  this  intraarea 
network,  the  output  of  which  projects  to  the  next  area.  The 
recurrent  backprojection  is  shown  arriving  in  the  superficial 
(top)  layers.  Deep  convolutional  networks  are  essentially 
directed  acyclic  graphs  [Fig.  2  (middle)];  more  realistic 
functionality  requires  a  graph  with  cycles  [Fig.  2  (middle, 
left)].  How  might  this  more  elaborate  function  be  described 
computationally?  Again,  there  are  many  possibilities. 

In  some  deep  networks,  the  feedforward  projections 
specify  activity,  and  the  feedback  modifies  synaptic  weights  by 
error  signal  backpropagation.  Richer  classes  of  graphical 
models  [60]  have  been  suggested  for  computational  reasons. 
Hierarchical  Bayesian  networks  [71]  postulate  inferences 
supported  by  a  combination  of  feedforward  observations  and 
feedback  priors.  For  a  problem  such  as  shape  from  shading,  for 
example,  feedforward  data  about  image  intensity  might  be 
interpreted  with  regard  to  feedback  involving  surface  and  light 
source  priors.  (We  discuss  this  further  in  Section  III-B.)  In 
computer vision terms, such inverse problems are often formulated
as finding a (latent) parameter vector that best describes
given (e.g., image) data according to a model [122]. The model
is realized as an energy function, and the model parameters are
learned from training data. A practical consideration is that
there are fast algorithms to guide the search for interpretation
parameters, but only for certain graphs [16], [116].

Bayesian  networks  [14]  and  Markov  random  fields 
(MRFs)  are  related  realizations  [1],  [76],  [115].  A  popular 
form  resembles  statistical  mechanics  [44]  and  motivates  a 
connection to regularization terms in MRFs [95]. Boltzmann
machines [42] exploit the underlying probability distribution
for sampling.

In a simple sense, neurons can be viewed as decision
makers, firing an action potential when they receive sufficient
support (ionic current) from other neurons projecting
to them. Considering the set of “neurons” as nodes in a graph,
we obtain a very simple form for such networks. Let the edges
specify which neurons are connected in the graph and, leaving
technical considerations aside, we obtain a natural quadratic
“energy” form relevant to Hopfield networks and (symmetric)
relaxation labeling [50]. In symbols, if $p_i$ denotes the
probability that neuron $i$ fires and $c_{ij}$ denotes the synaptic
coupling from neuron $j$ to $i$, then summing over all interacting
neighbors for each node in the graph yields

$$\text{Energy} = \sum_{i,j} p_i\, c_{ij}\, p_j. \tag{1}$$

Fig. 4. Association fields derive from pairwise co-occurrence statistics and illustrate the probability (likelihood) of a particular edge near a
horizontal edge at the center position. Two equally likely pairs of edge pairs are shown in (c) and (d); higher order co-occurrence probabilities are
necessary to determine which of these is more likely. (a) After [4]. (b) After [35]. (c) and (d) Data from [28].


Constrained  gradient  ascent  provides  an  approach  to 
determine  which  neurons  should  be  active  to  maximize 
energy  from  an  initial  distribution  pj.  More  generally, 
when  neurons  are  viewed  as  coupled  decision  makers,  a 
more  subtle  connection  to  polymatrix  games  arises  [80], 
[81];  in  this  case,  the  optimal  payoff  is  given  by  the  Nash 
equilibrium,  and  the  constraints  no  longer  need  to  be 
symmetric. For general MRFs, the constraints $c_{ij}$ are
embedded in clique potentials. Constraints are thus the
“guts” of the models: there are several different types of
machines (inference engines) within which to use them.
The  question,  then,  is  how  to  find  these  constraints. 

B.  From  Image  Statistics  to  Abstract  Constraints 

Statistical  regularities  underlie  many  models  of 
machine  and  biological  learning.  For  example,  objects  in 
our  visual  world  are  coherent,  and  this  coherence  is 
reflected  in  the  probabilities  that  edge  elements  (or  image 
intensities  or  other  features)  co-occur  [111].  The  famous 
Hebb  synapse  [18]  is  often  summarized  by  the  phrase:  cells 
that  fire  together  wire  together.  Since  many  cells  respond  to 
edges,  it  is  natural  to  start  with  those  statistics  (Fig.  3). 

Let $E_i$ denote an edge at position, orientation
$r_i = (x_i, y_i, \theta_i)$. Viewing this as a {0, 1}-valued random
variable $E_i$, the joint distribution is well studied
[4], [27], [35], [62]. It is convenient to view this
distribution  around  a  horizontal  edge  at  the  center  of  an 
image  patch  (Fig.  4).  Such  “association  field”  [32]  models 
of  continuation  are  prominent  in  psychophysical  research 
[26],  [39].  While  pairwise  information  is  useful,  higher 
order  structure  could  be  even  more  useful.  Thus  far,  such 
higher  order  information  has  been  developed  through 
models  tied  to  applications  [36],  [54].  As  we  now  show, 
following  [69],  it  is  possible  to  infer  higher  order  statistical 
information  directly. 

The  association  field  is  a  representation  of  pairwise 
information:  it  displays  roughly  the  probability  that  edge  Ej 
is  present  given  a  horizontal  edge  at  the  center.  Now 
consider triples of edges. These could derive from edge
pairs that are equally likely to occur but not likely to occur
together [Fig. 4(c) and (d)]; or from pairs that are likely to
occur together. Statistically, such third-order questions are
complex to answer (but see [119]).

Denote positive edge triple co-occurrences by
$P(E_i = 1, E_j = 1, E_k = 1) = P(i, j, k)$. This matrix can be estimated
from natural image edge patches by finding a strong edge,
moving it to the center of the patch (20 × 20 pixels; ten
orientations/position) and then rotating so that it is
horizontal

$$P(i, j \mid 0) = P(E_i = 1, E_j = 1 \mid E_0 = 1) \tag{2}$$

where $E_0 = 1$ denotes a horizontal edge at the origin.
(Edges are isolated by enforcing local nonmaxima
suppression and inhibiting lateral spread.) Since $P(i, j \mid 0)$
is positive semidefinite, edge triples can be visualized by
forming an embedding based on the eigenvectors that
diagonalize the matrix [38]

$$P(i, j \mid 0) = \sum_{l=1}^{n} \lambda_l\, \phi_l(i)\, \phi_l(j)$$

where the eigenvectors $\phi_l$ allow a spectral embedding $\Phi$.
$\Phi$ maps edges to points in an embedded space where
squared distance is equal to relative probability

$$\Phi(r_i) = \left(\sqrt{\lambda_1}\,\phi_1(i), \ldots, \sqrt{\lambda_n}\,\phi_n(i)\right)^{T}. \tag{3}$$

Fig. 5. Display of third-order edge structure showing how oriented edges are related to their spectral embeddings. (Top) Spectral embeddings.
Since the spectrum of $P(i, j \mid 0)$ decays rapidly, edges (points in illustration) that are likely to co-occur with $E_0$ can be visualized as clusters
(small diffusion distance). Embedded edges are plotted in $(\phi_2, \phi_4)$ coordinates and colored by the value of $\phi_2$, $\phi_3$, $\phi_4$ shown. (Bottom) Edge
distributions mapped back into $(x, y)$ and again colored by eigenfunctions. $\phi_2$ shows linear organization and $\phi_4$ shows a curvature organization.
Compare with Fig. 4 where red edges all have high probability of occurring with the center, but no information is known about their co-occurrence
probability. Figure after [69].


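In this space, the Euclidean distance between embedded
points is given by (see also [21])

$$\begin{aligned}
\|\Phi(r_i) - \Phi(r_j)\|^2 &= \langle \Phi(r_i), \Phi(r_i)\rangle - 2\langle \Phi(r_i), \Phi(r_j)\rangle + \langle \Phi(r_j), \Phi(r_j)\rangle \\
&= P(E_i = 1 \mid E_0 = 1) - 2P(E_i = 1, E_j = 1 \mid E_0 = 1) + P(E_j = 1 \mid E_0 = 1).
\end{aligned}$$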


The first and last terms in this embedding are basically
the association field: the edges likely to occur with the
center, horizontal edge. The middle term measures the
co-occurrence of the other pairs; in other words, edges $E_i$ and
$E_j$ that both frequently co-occur with a horizontal edge at
the center (see Fig. 5). These include straight continuations
and curves with positive and negative curvatures. In
other words, high-order edge statistics reflect the natural
geometry of contours.
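
The embedding construction can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the procedure of [69]: a small synthetic positive semidefinite matrix stands in for the measured conditional co-occurrence matrix (which would be estimated from image patches), and the function name `spectral_embedding` and the tolerance `tol` are hypothetical. It checks that squared Euclidean distance between embedded points equals the sum of the two diagonal entries minus twice the off-diagonal entry, which is the first-middle-last decomposition described above.

```python
import numpy as np

def spectral_embedding(P, tol=1e-12):
    """Embed row index i of a symmetric PSD matrix P as the vector
    (sqrt(lam_1) * phi_1(i), ..., sqrt(lam_n) * phi_n(i))."""
    lam, phi = np.linalg.eigh(P)      # eigenvalues ascending, eigenvectors in columns
    keep = lam > tol                  # drop numerically zero (or slightly negative) modes
    return phi[:, keep] * np.sqrt(lam[keep])

# Synthetic stand-in for the co-occurrence matrix (assumption: any PSD matrix
# suffices to illustrate the identity; real data would come from edge patches).
rng = np.random.default_rng(0)
A = rng.random((6, 3))
P = A @ A.T                           # PSD by construction, rank 3

Phi = spectral_embedding(P)           # one embedded point per row

i, j = 1, 4
d2 = np.sum((Phi[i] - Phi[j]) ** 2)   # squared distance in the embedded space
expected = P[i, i] - 2.0 * P[i, j] + P[j, j]
print(d2, expected)
```

Because the inner product of two embedded points reproduces the corresponding matrix entry, edges that co-occur often (large off-diagonal entry relative to the diagonal terms) land close together, which is why clusters appear in the embedding.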

In  summary,  whether  we  are  using  hidden  variables, 
priors,  or  synaptic  connections  is  determined  by  the 
inference  engine  employed.  In  all  cases,  these  variables 
represent  constraints:  constraints  between  neurons  at  the 
physiological  level  or  constraints  between  tokens  at  the 
scene  level.  Here  we  showed  that  there  is  significant 
higher  order  statistical  structure  to  edge  elements,  but  we 
had  to  develop  a  special  technique  to  reveal  it.  This  can  be 
viewed  as  a  learning  strategy.  Most  importantly,  it  revealed 
an  identification  with  geometrical  ideas,  which  we  take  as 
a  surrogate  to  working  with  very  high-order  statistics. 


C.  Overview  of  the  Paper 

Lighting  and  material  properties  combine  in  the  image 
formation  process:  even  simple  photometric  models 
involve  a  product  of  lighting  and  surface  albedo.  Such 
coupling between problems has been addressed in
computer vision as a series of coordinated intrinsic images
[6],  [118],  [124].  The  intrinsic  images  model  has  its  roots  in 
Land  and  McCann’s  retinex  theory,  which  was  developed  to 
explain  color  constancy.  Retinex  is  based  on  the  idea  that 
sharp  (high-frequency)  variations  denote  material  changes 
and  slow  (low-frequency)  variations  denote  cast  shadows. 
The  modern  version  [118]  keeps  the  idea  that  properties 
(e.g.,  albedo)  are  scalar  fields  over  the  image  and  also 
characterizes  points  of  change  (large  image  derivatives).  As 
we  will  show,  there  is  significantly  more  to  differential 
structure  than  that  involved  in  boundaries,  and  significantly 
more  to  color  variations  than  abrupt  material  versus  gradual 
shadow  edges.  This  will  elaborate  the  notion  of  geometrical 
models  introduced  above,  and  will  take  the  form  of  different 
flows — essentially  vector  fields — defined  along  curves  and 
within  shaded  or  colored  regions. 

Flows  are  related  to  differential  equations.  Shape  from 
shading  typically  involves  a  partial  differential  equation 
(PDE) to be solved over a global surface from boundary
information  [45]  using  smoothness  constraints;  similar 
smoothness  constraints  have  been  postulated  for  stereo 
(e.g.,  [20],  [79],  and  [96]).  We  develop  a  different  path: 
biology  suggests  looking  “locally,”  and  we  show  that  some 
parts  of  the  shape-from-shading  problem  are  inherently 
less  ambiguous  than  others.  Hence,  there  could  be  a  real 
advantage  to  “locking  down”  certain  parts  of  the  solution 
and  then  interpolating  others.  It  is  a  little  like  doing  a 
puzzle:  start  with  those  pieces  about  which  you  are 
certain  and  then  use  constraints  to  fit  nearby  pieces 
together  with  them.  Just  as  neurons  are  connected  into 
networks, problems such as these (and their decompositions)
imply networks of local problems that can be fitted
together  [78],  [83],  [103]. 

II.  EARLY  INFERENCE  PROBLEMS 

Early  biological  vision  often  connotes  boundary  detection 
and  segmentation  to  computer  vision  researchers,  because 
the first cortical visual area, V1, contains neurons selective
for  (a  sampling  of)  all  orientations  at  every  retinotopic 
position  [Fig.  6(a)  and  (b)].  It  is  thought  that  these  are 
local  edge  detectors.  Taken  together  we  have  a  columnar 
model  [48]  that  suggests  an  identification  with  the 
geometry  of  fiber  bundles.  We  start  with  such  models  to 
set  the  stage,  and  then  move  to  stereo  and  shading  analysis. 

A.  Contour  Geometry 

Visual  cortex  in  primates  provides  a  rich  substrate  for 
realizing  networks  of  orientationally  selective  neurons  that 
could  implement  the  high-order  statistical  constraints  just 
described  (Fig.  2,  bottom).  Orientation  selectivity  begins 
in  layer  4  [82],  [113];  there  is  a  substantial  projection  to 
the  upper  levels  [19],  [25]  that  is  associated  with  boundary 
processing  [2].  Anatomical  studies  reveal  that  these 
intrinsic  connections  are  clustered  [37]  and  orientation 
dependent [15], leading many to believe that consistent
firing among neurons in such circuits specifies the
orientations along a putative contour [32], [52], [128].
Random  fields  and  neural  networks  are  all  about  using 
context  (e.g.,  along  the  contour)  to  remove  noisy  responses 
that  are  inconsistent  with  their  neighbors’  responses  or  to 
reinforce weak or missing responses. How might constraints
$c_{ij}$ be designed for such a task? Do they resemble
third-order  edge  statistics? 

We  apply  this  machinery  to  contour  detection  in  Fig.  6 
following  [9].  Fig.  6(b)  shows  how  neurons  form  circuits 
with  long-range  horizontal  connections  [3],  [15],  [100]. 
Activity  in  such  circuits  can  be  interpreted  geometrically 
[Fig. 6(c)]: viewing orientationally selective responses as
signaling local, linear approximations to a contour
suggests interpreting them as signaling tangents to
contours.  Mathematically,  a  tangent  can  be  transported 
along  an  approximation  to  the  curve  (indicated  as  the 
osculating  circle)  to  a  nearby  position.  Compatible 
tangents  are  those  that  agree  with  sufficient  accuracy  in 
position  and  orientation  following  transport;  this  is  the 
cocircularity approximation [89]. In (position, orientation)
space [Fig. 6(d)], a length of circle in the image lifts
to a length of helix in $(x, y, \theta)$. Identifying this diagram
with  the  one  above  it  shows  that  the  transport  operation 
need  not  be  carried  out  mathematically;  it  can  be 
embedded  in  the  long-range  connections.  Projection  into 
the  image  plane  of  these  connections  indicates  either 
straight  [Fig.  6(e)]  or  curved  [Fig.  6(f)]  patterns.  In 
biology,  such  connections  are  called  projective  fields  [72]. 
Returning to (1), these are the $c^{\kappa}_{ij}$, for $i$ denoting diagonal
in the center and $j$ denoting another edge. The superscript
$\kappa$ indicates that these are a function of the curvature; cf.
the clusters of third-order edge structure (Fig. 5).
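
The transport idea behind cocircularity can be made concrete with a short sketch. It moves a tangent along its osculating circle, which is a helix when written in (x, y, theta), and scores a second tangent by how well it agrees after transport. The Gaussian scoring, the parameters `sigma_pos`, `sigma_theta`, `s_max`, `n_samples`, and all function names are assumptions introduced for illustration; they are not the compatibilities used in [9] or [89].

```python
import numpy as np

def transport(x, y, theta, kappa, s):
    """Move the tangent (x, y, theta) a signed arc length s along the
    osculating circle of curvature kappa; theta advances linearly (a helix)."""
    theta_new = theta + kappa * s
    if abs(kappa) < 1e-12:            # zero curvature: the circle degenerates to a line
        return x + s * np.cos(theta), y + s * np.sin(theta), theta_new
    x_new = x + (np.sin(theta_new) - np.sin(theta)) / kappa
    y_new = y - (np.cos(theta_new) - np.cos(theta)) / kappa
    return x_new, y_new, theta_new

def orientation_difference(a, b):
    """Smallest signed difference between two orientations taken modulo pi."""
    return (a - b + np.pi / 2) % np.pi - np.pi / 2

def compatibility(tangent_i, tangent_j, kappa,
                  sigma_pos=0.5, sigma_theta=0.3, s_max=5.0, n_samples=2001):
    """Hypothetical score in (0, 1]: transport tangent_i along its osculating
    circle, find the sample closest to tangent_j in position, and penalize the
    remaining position and orientation mismatch with Gaussians."""
    xi, yi, ti = tangent_i
    xj, yj, tj = tangent_j
    s = np.linspace(-s_max, s_max, n_samples)
    xs, ys, ts = transport(xi, yi, ti, kappa, s)
    d2 = (xs - xj) ** 2 + (ys - yj) ** 2
    k = int(np.argmin(d2))
    dtheta = orientation_difference(ts[k], tj)
    return float(np.exp(-d2[k] / (2 * sigma_pos ** 2)
                        - dtheta ** 2 / (2 * sigma_theta ** 2)))

# A quarter turn on a circle of radius 2 (kappa = 0.5), starting at the origin
# and heading along +x, ends at (2, 2) heading along +y.
end = transport(0.0, 0.0, 0.0, 0.5, np.pi)
good = compatibility((0.0, 0.0, 0.0), (2.0, 2.0, np.pi / 2), kappa=0.5)
bad = compatibility((0.0, 0.0, 0.0), (2.0, 2.0, 0.0), kappa=0.5)
print(end, good, bad)
```

Tabulating such scores once for every pair of discretized tangents gives a fixed connection pattern, which is the sense in which the transport need not be recomputed at run time.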

Algorithmically, we can use these connections by
elaborating the index in (1) to include curvature:
$i = (x_i, y_i, \theta_i, \kappa_i)$. The gradient ascent in energy is then as
follows.

Given: connections $\{c_{ij}\}$ and initial probability
estimates $\{p_i^0\}$ for each discretized position, orientation,
and curvature. Update: the probability estimates (until
convergence) by


(4) 

(5) 

where $\eta$ is a step size and $\Pi$ is a projection operator onto
the probability simplex (necessary to keep $0 \le p_i \le 1$ and
appropriately normalized). Consistency in firing according to these
patterns would, of course, reduce noisy responses, implying
an increase in firing sparsity [120].
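
A projected gradient ascent of this kind can be sketched as follows. This is an illustration under assumptions, not the paper's exact update: it takes a plain gradient step on the quadratic energy of (1) and then projects, it assumes the estimates at each position form a distribution over labels (one simplex per row), and the names `project_simplex`, `relax`, and the toy coupling matrix are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def energy(p, C):
    """Quadratic energy: sum over i, j of p_i * c_ij * p_j."""
    x = p.ravel()
    return float(x @ C @ x)

def relax(p0, C, eta=0.1, iters=200, tol=1e-9):
    """Projected gradient ascent; p has shape (positions, labels) and each
    row is kept on the simplex."""
    p = p0.copy()
    S = C + C.T                       # gradient of x^T C x is (C + C^T) x
    for _ in range(iters):
        grad = (S @ p.ravel()).reshape(p.shape)
        p_new = np.apply_along_axis(project_simplex, 1, p + eta * grad)
        if np.abs(p_new - p).max() < tol:
            return p_new
        p = p_new
    return p

# Toy problem (assumption): 3 positions in a row, 2 labels each; the same
# label at neighboring positions is mutually supporting.
n_pos, n_lab = 3, 2
C = np.zeros((n_pos * n_lab, n_pos * n_lab))
for a in range(n_pos - 1):
    for l in range(n_lab):
        i, j = a * n_lab + l, (a + 1) * n_lab + l
        C[i, j] = C[j, i] = 1.0

p0 = np.array([[0.6, 0.4], [0.5, 0.5], [0.55, 0.45]])
p = relax(p0, C)
print(p, energy(p0, C), energy(p, C))
```

In the toy run the undecided middle position is pulled toward the label its neighbors weakly prefer, and the final estimates are sparse (one label per position), which mirrors the noise-reduction role of the connections described above.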

In addition to the connections intrinsic to V1, there are
feedforward projections from layers 2/3 to higher visual
areas [3], [88], [112]. V2, for example, has an elaborate
organization into subzones, including the thin, thick, and
pale stripe areas [102]. It is thought these participate in
stereo and color computations, to which we will turn
shortly. There are also feedback projections from higher
visual areas [3], [101]. Since receptive fields are larger in
higher areas, this could involve contour computations over
a larger scale [128].

Vol. 102, No. 5, May 2014 | Proceedings of the IEEE 817

Zucker: Stereo, Shading, and Surfaces: Curvature Constraints Couple Neural Computations

[Fig. 6 panels (a)-(f); panel title: Ideal models lifted to $\mathbb{R}^2 \times S^1$]

Fig. 6. Columnar organization of visual cortex. (a) A group of cells selective for different orientations at about the same location in the visual field.
(b) This column of cells is rearranged in (position, orientation) coordinates. (Long-range horizontal) connections between cells relate an
orientation signal $\theta$ at position $(x, y)$ to another orientation $\theta'$ at $(x', y')$. (c) If each cell signals a tangent to a contour, then transport along the
contour can reveal consistency among nearby tangents. (d) Using the osculating circle as a local approximation to the curve, transport over
short distances in $(x, y, \theta)$ is movement along a helix. By identification with (b), these helices are a model for the horizontal connections.
They are a function of curvature, either straight (e) or curved (f). Figures after [9].

Differential geometry specifies how orientations align
along a contour. Following [87], let $\beta : I \to \mathbb{R}^2$ with
$\|\beta'(s)\| = 1$, $s \in I$, denote the unit-speed curve defined by
the differentiable map from the interval $I$ into Euclidean
2-space. The unit tangent is $T = \beta'$, from which we get
$T' = \beta''$, the curvature vector field. Observe that $T'$ is
orthogonal to $T$ (just differentiate $T \cdot T = 1$). The direction
of the curvature vector is normal to $\beta$, and its length
$\kappa(s) = \|T'(s)\|$ defines the curvature. The vector field
$N = T'/\kappa$ defines the principal normal.

Fig. 7. Discontinuities in $(x, y, \theta)$-space are represented by multiple orientations at the same location. (a) Image of a Klein bottle with
edges (b). Such edges signal monocular occlusion events; when lifted into $(x, y, \theta)$-space there are multiple values at a position, shown (c) by tilting
$(x, y, \theta)$-space so that the fibers are at an angle.

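The Frenet frame field on $\beta$ is the pair $(T, N)$ such that
$T \cdot T = N \cdot N = 1$, all other dot products $= 0$, and the
above conditions hold. The elegance of cortical geometry
derives from the fact that derivatives of the frame can be
expressed in terms of the frame itself. For $\kappa > 0$, we have

$$\begin{pmatrix} T' \\ N' \end{pmatrix} = \begin{pmatrix} 0 & \kappa \\ -\kappa & 0 \end{pmatrix} \begin{pmatrix} T \\ N \end{pmatrix}. \tag{6}$$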
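
The frame equations can be checked numerically on a curve whose curvature is known. The sketch below samples a unit-speed arc of a circle of radius 2, builds $T$, $N$, and the curvature by finite differences, and compares $N'$ with $-\kappa T$. The helper name `frenet_frame`, the sampling density, and the choice of test curve are assumptions made for illustration only.

```python
import numpy as np

def frenet_frame(beta, s):
    """Finite-difference Frenet frame for a unit-speed planar curve.
    beta: (n, 2) samples of the curve; s: (n,) arc-length parameters."""
    T = np.gradient(beta, s, axis=0)              # unit tangent T = beta'
    T_prime = np.gradient(T, s, axis=0)           # curvature vector T' = beta''
    kappa = np.linalg.norm(T_prime, axis=1)       # curvature = ||T'||
    N = T_prime / kappa[:, None]                  # principal normal N = T' / kappa
    return T, N, kappa

R = 2.0                                           # radius, so curvature should be 1/R
s = np.linspace(0.0, np.pi * R, 2001)             # arc length along a half circle
beta = R * np.column_stack([np.cos(s / R), np.sin(s / R)])

T, N, kappa = frenet_frame(beta, s)
N_prime = np.gradient(N, s, axis=0)

inner = slice(5, -5)                              # skip one-sided differences at the ends
print(kappa[inner].mean())                        # close to 0.5
print(np.abs(N_prime[inner] + kappa[inner, None] * T[inner]).max())
```

The second printed value is the residual of $N' = -\kappa T$; it is small because, for a planar curve, rotating the frame is the only thing differentiation can do to it.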


The lift from the image into cortical coordinates
[Fig. 6(b) and (d)] reveals a rich connection to Gestalt
principles [121]. Good continuation [125] for curves (that
slow changes in orientation should be preferred to
sudden, abrupt ones) has a special realization in $(x, y, \theta)$-coordinates:
at the crossing point of a figure “8” are two line
orientations (tangents), but these are separated along the
column (the fiber) of orientations. Good continuation
means that there is no big jump along a fiber; the
connections to nearby tangents are “shorter” by passing
through the junction. The nonsimple curve in the plane
becomes a simple curve in $(x, y, \theta)$. The contact geometry
for this has been worked out [106]; see also [126].

Discontinuities  are  a  different  story,  however  (Fig.  7). 
Now  multiple  orientations  at  the  same  position  signal  what 
often  amounts  to  a  monocular  occlusion  event  [13],  [128]; 
a  contour  ending  can  signal  a  cusp  [68]. 

Before  moving  on,  we  draw  a  lesson  from  the  columnar 
organization.  The  column  is  a  representational  architecture 
that contains each possible curve tangent at every position;
the bundle of columns contains every possible curve. This
architecture  will  be  repeated  for  other  problems. 


B.  Texture  and  DTI 

Orientation-defined  textures  [11],  [53],  [98],  [114]  arise 
when  oriented  elements  are  dense  in  two  directions  rather 
than  one,  in  effect  weaving  edges  together  into  a  tapestry. 
Again  the  orientation  column/fiber  bundle  structure  works 
ideally  to  represent  such  patterns,  and  again  there  is  a  high- 
order curvature dependency. The mathematics are generalized,
with the Frenet curvature replaced by a Cartan
connection form [87] (Fig. 8). The frame at each location is
denoted $(E_T, E_N)$ and transport is generalized from tangential
motion along a streamline to the entire tangent plane.
This will lead to richer projective fields.

The  transport  equations  are  analogous  to  the  curve 
case,  except  now  it  is  possible  to  move  the  frame  in  any 
(tangent  plane)  direction  rather  than  only  along  a  contour. 
This  requires  the  use  of  covariant  derivatives  rather  than 
standard ones, and a one-form $w_{12}$ for the curvature

$$\begin{pmatrix} \nabla_V E_T \\ \nabla_V E_N \end{pmatrix} = \begin{pmatrix} 0 & w_{12}(V) \\ -w_{12}(V) & 0 \end{pmatrix} \begin{pmatrix} E_T \\ E_N \end{pmatrix}. \tag{7}$$

The Cartan connection equations resemble the Frenet-Serret
formulas but involve the connection form $w_{12}(V)$. Such forms
“take” a vector as “input” and “output” a scalar. Just as surface
curvature can be expressed in terms of principal curvatures,
for general oriented patterns there are two basic curvatures

$$\begin{aligned}
\text{tangential curvature:}\quad \kappa_T &= w_{12}(E_T) \\
\text{normal curvature:}\quad \kappa_N &= w_{12}(E_N).
\end{aligned} \tag{8}$$


Fig. 8. Cartan connections define the curvature structure and connection patterns in orientation-defined textures. The rotation can now be in any
direction in the tangent plane. (a) Displacement in the $V$-direction yields a rotation of the frame according to the covariant derivative $\nabla_V$.
(b) Displacement in another direction $V$. Note the rotation is different. (c) Excitatory connections between neurons for $\kappa_T = \kappa_N = 0$,
(d) $\kappa_T = 0.2$, $\kappa_N = 0$, (e) $\kappa_T = 0.2$, $\kappa_N = 0.2$. Figure after [11].

Psychophysically, we are sensitive to these curvatures
[8], [12], [52], [84]. Knowledge of $E_T$, $E_N$ at a point
$(x_0, y_0)$ allows us to develop an osculating flow field
analogous to the osculating circle in cocircularity, and the
right helicoid has several natural properties. Letting $\theta(x, y)$
denote the field of orientations around $(x_0, y_0)$

$$\theta(x, y) = \tan^{-1}\!\left(\frac{\kappa_T x + \kappa_N y}{1 + \kappa_N x - \kappa_T y}\right). \tag{9}$$

The long-range horizontal connections could again implement
them as projective fields [9]. [Compare Fig. 8(c)-(e)
with Fig. 6(e) and (f).] The importance of the $(x, y, \theta)$
representation is further illustrated with nonsimple
patterns, such as crossing textures; see Fig. 9.
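
The link between the osculating flow field and the two curvatures can be checked with a short sketch. It assumes the right-helicoid form $\theta(x, y) = \tan^{-1}\big((\kappa_T x + \kappa_N y)/(1 + \kappa_N x - \kappa_T y)\big)$ in coordinates centered on $(x_0, y_0)$ with the tangent along the $x$-axis, and it assumes the planar convention that, for a frame $E_T = (\cos\theta, \sin\theta)$, the connection form acts as $w_{12}(V) = \nabla\theta \cdot V$. The function names and the finite-difference step `h` are hypothetical.

```python
import numpy as np

def helicoid_orientation(x, y, k_t, k_n):
    """Orientation of the assumed right-helicoid flow at (x, y), relative to a
    horizontal tangent at the origin. arctan2 agrees with the inverse tangent
    wherever the denominator 1 + k_n*x - k_t*y is positive (near the origin)."""
    return np.arctan2(k_t * x + k_n * y, 1.0 + k_n * x - k_t * y)

def curvatures_at_origin(k_t, k_n, h=1e-5):
    """Estimate (tangential, normal) curvature at the origin as the directional
    derivatives of theta along E_T and E_N, using central differences."""
    theta0 = helicoid_orientation(0.0, 0.0, k_t, k_n)
    e_t = np.array([np.cos(theta0), np.sin(theta0)])
    e_n = np.array([-np.sin(theta0), np.cos(theta0)])
    dtheta_dx = (helicoid_orientation(h, 0.0, k_t, k_n)
                 - helicoid_orientation(-h, 0.0, k_t, k_n)) / (2 * h)
    dtheta_dy = (helicoid_orientation(0.0, h, k_t, k_n)
                 - helicoid_orientation(0.0, -h, k_t, k_n)) / (2 * h)
    grad = np.array([dtheta_dx, dtheta_dy])
    return float(grad @ e_t), float(grad @ e_n)

# Sample the flow on a small patch and recover the curvatures that generated it.
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
theta = helicoid_orientation(xs, ys, 0.2, 0.1)
est_t, est_n = curvatures_at_origin(0.2, 0.1)
print(theta.shape, est_t, est_n)
```

Under these assumptions the recovered values match the curvatures fed in, so a single (orientation, tangential curvature, normal curvature) triple at a point fixes the local flow pattern that nearby measurements should agree with.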

The  helicoid  can  be  generalized  to  an  orientation  field 
in  a  volume  from  one  in  the  plane,  and  has  been  used  for 
applications  such  as  modeling  hair  patterns  [92].  This 
illustrates  the  serendipity  that  can  be  achieved  with 
mathematical  models:  although  we  began  with  cortical 
connections  in  mind,  generalizations  have  arisen  to  other 
anatomical  applications.  Many  of  these  are  triggered  by  the 
development  of  new  imaging  technologies  such  as 
diffusion  MRI  or  diffusion  tensor  imaging  (DTI).  This 
technology  is  able  to  image  the  diffusion  of  water 
molecules  in  biological  tissues,  such  as  white  matter  fibers 
in  the  brain.  Because  many  of  these  fiber  tracks  cross, 
regularization must be conducted “along” the fibers and
not between them [73]. The geometry in Fig. 9 illustrates
precisely this.
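
Why the lifted representation separates crossing structure can be seen in a toy sketch. The code below is an illustration under simple assumptions (a hard binning of orientation into `n_theta` layers, two constant-orientation textures covering the same pixels); the function `lift` and the sizes are hypothetical, not taken from the cited work.

```python
import numpy as np

def lift(orientation_maps, n_theta=12):
    """Lift a list of (H, W) orientation maps (radians, NaN = no texture)
    into a boolean (n_theta, H, W) volume over orientations in [0, pi)."""
    h, w = orientation_maps[0].shape
    volume = np.zeros((n_theta, h, w), dtype=bool)
    for theta in orientation_maps:
        valid = ~np.isnan(theta)
        bins = np.floor((theta[valid] % np.pi) / np.pi * n_theta).astype(int) % n_theta
        rows, cols = np.nonzero(valid)       # same row-major order as theta[valid]
        volume[bins, rows, cols] = True
    return volume

# Two textures that overlap at every pixel but differ in orientation.
texture_a = np.full((8, 8), np.deg2rad(35.0))
texture_b = np.full((8, 8), np.deg2rad(125.0))
volume = lift([texture_a, texture_b])

active_layers = np.nonzero(volume.any(axis=(1, 2)))[0]   # which theta layers are used
per_pixel = volume.sum(axis=0)                           # orientations present per pixel
```

In the image plane the two textures collide at every position; in the volume they occupy different orientation layers, so any smoothing restricted to a layer (or to nearby orientations) acts along each texture and not between them.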

Another  geometrically  related  application  is  to  the 
arrangement  of  myofibers  in  the  heart  wall  [107];  see 
Fig.  10.  Individual  myofibers  have  the  form  of  helices  and 
shorten  in  length  during  contraction.  The  generalized 
helicoid  model  extends  from  fibers  to  distributions  of 
fibers,  in  particular  providing  optimal  volume  change 
without  tangling. 


Fig. 9. Crossing textures separate in the $(x, y, \theta)$ representation.
These are analogous to crossing fiber tracks in brain imaging.
For motion analysis in computer vision these are called layered
representations [123].
Fig. 10. The left ventricle in the heart is surrounded by myofibers that
provide contractile strength. Each individual fiber follows a helical
geometry; the ensemble of fibers is arranged as a generalized helicoid.
The instantaneous “angle” of each fiber rotates smoothly as it wraps
around, and also varies smoothly across fibers moving from the
exterior to the interior of the ventricle wall. Figure adapted from [107].


III. SURFACES AND INTERMEDIATE-LEVEL VISION

Early  (2-D)  vision  was  based  on  the  lift  of  image  properties 
into  (position,  orientation  column)  organizations.  Such 
organizations  have  natural  “good  continuation”  properties, 
with  curvature  relating  nearby  orientations.  We  now 
consider  surfaces  and  3-D  inferences.  These  naturally 
involve  products  of  earlier  representations. 

A.  Orientation-Based  Stereo  Correspondence 

Stereo  infers  depth  by  integrating  the  different  images 
striking our eyes. It begins in V1, where cells exist that are
selective  to  positional  or  phase  shifts  in  Gabor-like 
receptive  fields  [97].  This  positional  disparity  is  not  all  of 
the  story,  however:  it  must  be  integrated  over  larger 
distances  to  yield  a  consistent  depth  percept.  Evidence  of 
recurrent  computation  is  now  appearing  [104],  [105],  in 
analogy  with  curve  inferences.  How  might  such  recurrent 
computations  be  structured? 

Almost all disparity selective neurons in V1 are also
orientationally  selective  [93].  The  second  visual  area,  V2, 
is  also  very  rich  in  disparity  processing  with  orientationally 
selective  cells  [102].  Therefore,  we  ask  how  positional 
differences  and  orientation  could  combine  in  stereo 
correspondence. 

To understand how Euclidean space and cortical
coordinates relate, consider the border of an object as a
space curve in 3-D. For image boundaries, we studied good
continuation in 2-D; now we will study good continuation
in the world. But this is not what is given; it is what is
sought. We start with a pair of images, one to the left eye
and one to the right, which in visual cortex (following the
previous abstraction) amounts to columnar representations
of boundary tangents in the left and right images
(Fig. 11). The method for putting them together follows
[74]. We move beyond spatial disparity to determine which
tangent in the left image goes with which tangent in the
right image. This is the correspondence problem.

For 2-D boundaries, tangents were transported along
cocircular approximations to establish consistency. Orthogonal
to the tangent was the normal vector. The situation in
3-D is conceptually the same (Fig. 11), except now the
tangent vector is a 3-D vector and the full geometry is
captured by transporting a (tangent, normal, binormal) or
(T, N, B) frame. Again “curvatures” connect frame components.
Torsion, a kind of curvature out of the osculating
(T, N) plane, is the second rotation [87].

We  now  develop  tangent  correspondence  between  the 
left  and  right  images  by  first  considering  the  forward 
problem.  The  (T,  N,  B)  frame  at  a  point  along  a  space  curve 
in 3-D projects to a pair of 2-D (T, N) frames [Fig. 11(d)]. In
general,  these  2-D  frames  are  different.  Their  points  of 
attachment  in  image  coordinates  will  be  displaced;  this  is 
the  spatial  disparity.  But  just  as  importantly,  their  angles  will 
be  different;  this  is  orientation  disparity.  All  of  this  structure 
derives  only  from  the  projection  of  a  single  frame. 
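
The forward problem can be sketched with an idealized camera model. The code below is an illustration only: it assumes two pinhole cameras with parallel optical axes separated along the x-axis, unit focal length, and a hypothetical baseline of 0.065 units; none of these choices come from the source. It projects one 3-D point and tangent into both images and reads off the spatial and orientation disparities.

```python
import numpy as np

def project_tangent(point, tangent, camera_x, focal=1.0):
    """Pinhole projection of a 3-D point and tangent for a camera at
    (camera_x, 0, 0) looking down +Z. Returns (image point, orientation)."""
    X, Y, Z = point - np.array([camera_x, 0.0, 0.0])
    Tx, Ty, Tz = tangent
    image_point = focal * np.array([X / Z, Y / Z])
    # Differentiating the projection along the curve gives the image tangent.
    image_tangent = focal * np.array([(Tx * Z - X * Tz) / Z**2,
                                      (Ty * Z - Y * Tz) / Z**2])
    orientation = np.arctan2(image_tangent[1], image_tangent[0]) % np.pi
    return image_point, orientation

baseline = 0.065                                     # hypothetical eye separation
point = np.array([0.0, 0.0, 1.0])                    # point on the space curve
tangent = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)   # tangent tilted in depth

p_left, theta_left = project_tangent(point, tangent, -baseline / 2)
p_right, theta_right = project_tangent(point, tangent, +baseline / 2)
spatial_disparity = p_left[0] - p_right[0]
orientation_disparity = theta_left - theta_right
```

In this setup a tangent lying in a fronto-parallel plane projects to the same orientation in both images, whereas a tangent with a depth component does not; the orientation disparity therefore carries information about the 3-D tangent that positional disparity alone does not.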

Solving  stereo  correspondence  is  an  inverse  problem: 
find  those  pairs  of  (left,  right)  tangents,  such  that  the 
resultant  3-D  tangent  can  be  inferred.  This  inverse 
problem  is  inherently  ambiguous  in  the  same  way  that 
the  2-D  curve  inference  problem  was  ambiguous,  so  we 
solve  the  3-D  problem  in  an  analogous  fashion.  Good 
continuation  for  2-D  curves  came  from  transporting  a 
tangent  via  cocircularity  and  reinforcing  those  that  agreed. 
In  3-D,  a  single  tangent  projects  into  each  of  the  two  image 
planes.  Moving  slightly  along  the  3-D  space  curve  again 
requires  an  approximation;  in  this  case,  a  short  piece  of  a 
helix generalizes the 2-D osculating circle. Now consider
a second (3-D) tangent slightly further along the
space curve from the first one; it will project to another
pair of tangents [Fig. 11(e)]. Thus, the stereo problem is
solved by determining which tangent pairs, when
transported along a helix, match which other pairs. This
is how the results in Fig. 11(c) were obtained.
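
Transport along a short helical piece has a closed form, sketched below. This is a generic differential-geometry construction rather than the procedure of [74]: it assumes constant curvature `kappa` and torsion `tau` over the step, so that the Frenet frame rotates rigidly about the Darboux vector $\tau T + \kappa B$. The function name and the example values are illustrative.

```python
import numpy as np

def transport_along_helix(r0, T0, N0, kappa, tau, s):
    """Move arc length s along the helix with constant curvature kappa and
    torsion tau that osculates the curve at r0 with frame (T0, N0, T0 x N0).
    Returns the new position and unit tangent."""
    B0 = np.cross(T0, N0)
    omega = tau * T0 + kappa * B0          # Darboux vector of the Frenet frame
    rate = np.linalg.norm(omega)
    if rate < 1e-12:                       # straight line: nothing rotates
        return r0 + s * T0, T0
    axis = omega / rate
    along = np.dot(axis, T0) * axis        # tangent component along the axis
    perp = T0 - along                      # component that rotates
    side = np.cross(axis, T0)
    angle = rate * s
    T = along + np.cos(angle) * perp + np.sin(angle) * side
    # Position is the integral of the rotating tangent over [0, s].
    r = r0 + s * along + (np.sin(angle) * perp + (1.0 - np.cos(angle)) * side) / rate
    return r, T

origin = np.zeros(3)
T0 = np.array([1.0, 0.0, 0.0])
N0 = np.array([0.0, 1.0, 0.0])

# Zero torsion reduces to the 2-D osculating circle (radius 1 / kappa).
r_circle, T_circle = transport_along_helix(origin, T0, N0, kappa=2.0, tau=0.0, s=np.pi / 4)
# Nonzero torsion lifts the curve out of the osculating (T, N) plane.
r_helix, T_helix = transport_along_helix(origin, T0, N0, kappa=1.0, tau=1.0, s=1.0)
```

The transported position and tangent could then be projected into the two images and compared with the measured tangent pairs there; how agreement is scored is left open in this sketch.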

The  machinery  to  implement  this  computation  could  be 
formulated  as  a  set  of  neural  connections,  perhaps  realized  in 
the V1 → V2 projection, within V2, or in higher areas. A
major  constraint  that  derives  from  this  model  is  that  the 
accuracy  at  which  orientation  is  represented  needs  to  be 
sufficient  to  support  orientation  disparity  estimates;  perhaps 
this  explains  why  the  stereo  task  is  relegated  to  higher  visual 
areas.  There  exists  evidence  that  such  responses  are 
available  by  V4  [40]  and  psychophysics  supports  (at  least) 
collinear facilitation in depth [47]. Moreover, rivalry results
when  nonmatching-oriented  patterns  are  used  [51]. 

Fig. 11. The stereo correspondence problem for space curves. (a) and (b) A left-right image pair demonstrating that structure may appear
in a different ordering when projected into the left and right eyes (highlighted box). (c) Color-coded depth inferred along the tree branches.
Note how it varies smoothly along a branch but abruptly between branches. (d) Geometrical setup: the spiral curve in 3-D projects to
two image curves. Points along the space curve have (T, N, B) associated frames, while the 2-D curves have (T, N) frames. Notice how a tangent
to the space curve projects to a pair of (2-D) tangents, one in the left image and one in the right image. (e) Stereo correspondence between
pairs of (left-right) pairs of tangents. Figure after [74].

Fig. 12. Stereo for surfaces. The surface normal N(p) varies smoothly
and generally differs from nearby normal N(q). Each is orthogonal to
the tangent plane (e.g., $T_p$) at that point. Moving an infinitesimal
distance along the curve connecting p and q induces a small rotation
in the normal (or, equivalently, in the tangent plane); this rotation
is a type of surface curvature in that direction. Taking all possible
directions into account yields the shape operator, another curvature
form. Figure after [75].

As with 2-D curves, the good continuation approach to
solving stereo correspondence for space curves relies on
curvatures. Another leap is required when stereo for
surfaces is considered (Fig. 12). Now, instead of a tangent
to a surface, there is a tangent plane, and it rotates
depending on the direction in which it is transported. To
build intuition, consider slicing an apple: for every
direction in which the knife is pointed (the direction of
transport) a different cut (surface curve) is made. Each cut
defines a curvature, which specifies how the surface
normal varies as it is transported in different directions
(the shape operator). Details for how to solve the stereo
problem for surfaces can be found in [75]. Now, we turn to
another way to get surface information: shading analysis.
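
Before moving on, the apple-slicing picture of the shape operator can be made concrete with a small numerical sketch. It assumes a quadratic surface patch written as a height function over its own tangent plane, in which case the shape operator at the origin is the matrix of second derivatives (up to a sign convention for the normal). The coefficients and function names are illustrative.

```python
import numpy as np

def shape_operator(a, b, c):
    """Shape operator at the origin of the patch
    z = 0.5 * (a*x**2 + 2*b*x*y + c*y**2), whose tangent plane is z = 0."""
    return np.array([[a, b], [b, c]])

def normal_curvature(S, phi):
    """Curvature of the cut made by slicing in tangent direction phi."""
    v = np.array([np.cos(phi), np.sin(phi)])
    return v @ S @ v

S = shape_operator(a=1.0, b=0.0, c=0.25)
directions = np.linspace(0.0, np.pi, 7)              # knife directions
cuts = [normal_curvature(S, phi) for phi in directions]
principal = np.linalg.eigvalsh(S)                    # extreme curvatures over all cuts
```

Each knife direction yields one curvature, and the eigenvalues of the operator bound the curvatures of all possible cuts.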

B.  Orientation-Based  Shape  From  Shading 

Ernst Mach may have been the first to formulate a
shape-from-shading inference problem as a PDE [99], a
tradition taken up with enthusiasm in computer vision
[46]. Typically, one seeks a map from image intensities to
some representation of the surface (usually surface
normals) under a given shading model (usually Lambertian).
Various ways to formulate the PDEs [23], [77], [85]
or regularization conditions [96] have been proposed.

Ambiguity  arises  at  several  levels.  Even  with  a  simple 
Lambertian  model,  many  different  surface  normals  could 
account  for  a  given  image  intensity  given  a  light  source; 
and  in  general  there  are  many  possible  light  sources  [67]. 
Perhaps  the  most  common  solution  is  to  place  a  global 
prior  on  the  light  source  [33];  or  an  assumption  on  the 
class  of  surfaces  [72],  [91];  or  to  try  to  estimate  the  source, 
albedo,  and  shape  simultaneously  [5],  [127].  At  the  base  is 
a  global  bas-relief  ambiguity.  In  general,  there  is  a  deep 
sense  of  frustration  around  this  problem,  exacerbated  by 
the  fact  that  we  “seem”  to  be  able  to  do  it  so  easily 
(although  this  is  in  part  an  illusion  [29],  [58],  [59]). 
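
The first level of ambiguity above (many surface normals, one image intensity) can be illustrated directly. The sketch assumes a Lambertian model with unit albedo and a unit-length light direction, so that intensity is the dot product of normal and light; the function name and sample count are illustrative.

```python
import numpy as np

def normals_with_intensity(light, intensity, n_samples=8):
    """Unit normals N with N . light == intensity (unit albedo, unit light)."""
    light = light / np.linalg.norm(light)
    # Build any orthonormal pair (u, v) perpendicular to the light direction.
    helper = np.array([1.0, 0.0, 0.0]) if abs(light[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(light, helper)
    u /= np.linalg.norm(u)
    v = np.cross(light, u)
    radius = np.sqrt(1.0 - intensity**2)
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    return np.array([intensity * light + radius * (np.cos(t) * u + np.sin(t) * v)
                     for t in angles])

light = np.array([0.0, 0.0, 1.0])
normals = normals_with_intensity(light, intensity=0.8)
rendered = normals @ light     # Lambertian image intensity for each normal
```

All sampled normals lie on a cone around the light direction and render to the same brightness, so a single intensity measurement cannot distinguish among them even when the light source is known.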

In  seeking  ways  that  our  brains  could  infer  shape  from 
shading,  we  begin  not  with  the  image  but  with  how  the 
image would be represented in visual cortex (Fig. 13).
Ideally,  cells  tuned  to  low  spatial  frequencies  will  respond 
maximally  when,  e.g.,  the  excitatory  receptive  field 
domain  is  aligned  with  brighter  pixels;  the  inhibitory 
domain  of  an  oriented  receptive  field  will  then  align  with 
the  darker  regions.  These  maximally  responding  cells 
define  the  shading  flow  field  in  cortical  space  [17];  it  is  the 
tangent  map  to  the  image  isophotes  [57]. 

Working  with  the  shading  flow  removes  some 
ambiguity — it  is  invariant  to  arbitrary  monotonic  intensity 
transformations  [56] — and  it  reduces  image  noise.  But  the 
biologically  motivated  algorithms  with  which  we  have 
been  working  suggest  a  more  radical  advantage:  consider 
the  shading  flow  as  a  vector  field,  or  section  through  the 
bundle  of  possible  shading  flows,  and  apply  the  machinery 
of  differential  geometry  to  it.  This  research  program  is 
being carried out now [63]-[65], and we report current
progress  in  it. 
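
The invariance of the shading flow to monotonic intensity transformations can be checked numerically. The sketch below substitutes a simple finite-difference estimate for the cortical measurement described above: it takes the isophote orientation to be the image gradient rotated by 90 degrees, uses a synthetic smooth image, and cubes the intensities as an example monotonic remapping. All of these choices are assumptions made for illustration.

```python
import numpy as np

def shading_flow(image, spacing=1.0):
    """Isophote orientation in [0, pi) and gradient magnitude.

    The orientation is the image gradient direction rotated by 90 degrees.
    """
    d_row, d_col = np.gradient(image, spacing)
    gradient_angle = np.arctan2(d_row, d_col)
    return (gradient_angle + np.pi / 2.0) % np.pi, np.hypot(d_row, d_col)

step = 0.01
coords = np.arange(-1.0, 1.0 + step / 2, step)
x, y = np.meshgrid(coords, coords)
image = np.exp(-(x**2 + 2.0 * y**2))     # synthetic smooth shading pattern
remapped = image**3                      # a monotonic change of intensity

flow, strength = shading_flow(image, step)
flow_remapped, _ = shading_flow(remapped, step)

# Compare orientations modulo pi where the gradient is well defined,
# skipping the border where one-sided differences are less accurate.
mask = strength > 0.1 * strength.max()
mask[[0, -1], :] = False
mask[:, [0, -1]] = False
max_mismatch = np.abs(np.sin(flow - flow_remapped))[mask].max()
```

The remaining mismatch is discretization error: the remapping rescales the gradient magnitude at each point but leaves the isophote direction unchanged.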


Fig. 13. Representation of shading information in visual cortex. Cells with oriented receptive fields, tuned to low spatial frequencies, will respond
optimally when aligned along isophotes, or contours of constant brightness. Activity in $(x, y, \theta)$ space is thus the tangent map to these
isophotes: the shading flow field. This is analogous to the lift of oriented textures. Figure after [66].
Fig. 14. Geometry of motion through a shading flow field. Moving
along the curve $P_1(t)$, from $P_0$ to $P_1$ in the isophote direction $v$, implies
that the flow field $V(x, y)$ changes by $\nabla_v V$. Moving in direction $u$ along
$P_2(s)$, which is perpendicular (in the image) to the isophote, causes
the flow field to change by $\nabla_u V$. These changes can be formally related
to the surface curvatures and the light source direction.


Corresponding to the shading flow is an illuminated
surface and, generalizing from earlier ideas about transport,
the trick is to analyze what happens on the surface as
you move through the shading flow field (Fig. 14). Walking
in the direction of a tangent corresponds to walking along
an isophote on the surface. Under, e.g., Lambertian
reflectance, the tangent plane has to rotate precisely so that
the brightness remains constant. Moving normal to the
shading flow, by contrast, constrains how the brightness
gradient must change. Together, these constraints on the flow
changes  correspond  to  changes  in  the  surface  curvatures 
and  result  in  a  system  of  differential  equations  that  can  be 
solved  in  certain  circumstances.  Apart  from  bas-relief 
ambiguity,  they  reveal  a  family  of  possible  surface  patch/ 
light  source  combinations  for  each  patch  of  shading  flow. 
These  patches  include  the  classical  bas-relief  “cup”  versus 
“bump”  ambiguity,  plus  a  number  of  twisted  ones  [64]. 

Putting  the  possible  patches  together  suggests  finding  a 
section  through  a  more  complex  bundle  than  previously 
reviewed  (Fig.  15).  Some  boundary  conditions  are  available 
to  select  from  among  these,  for  example,  the  manner  in 
which  surfaces  curve  as  they  approach  a  boundary  [49], 
[55],  but,  in  general,  this  is  not  sufficient  to  reduce 
ambiguity  to  bas-relief. 

Having the differential equations that allow
calculation of surfaces from shading flows permits another
type of analysis: one can ask for which features the
ambiguity is minimal (Fig. 16). This turns out to be not just
around  certain  boundary  conditions  but  also  for  ridges  and 
related  structures.  We  conjecture  that  this  is  the  reason 
why  shading  analysis  appears  to  work  so  well — it  is  rather 
nicely  defined  in  certain  circumstances — and  may  clarify 
why  certain  boundaries  are  important  in  viewing  art  and 
drawings  [22].  When  ambiguity  is  extensive,  almost  all 
reasonable  prior  assumptions  will  be  questionable,  so 
perhaps  shading  analysis  should  not  even  be  attempted. 
Fig. 15. Inference of shape-from-shading information as a problem in perceptual organization. For each patch of the shading flow field there
is a family of possible surfaces; this family is a kind of column of possibilities analogous to the orientation column in early visual cortex.
Selecting from among these families according to boundary and interior conditions reveals a surface just as selecting orientations reveals a
contour. Figure after [66].
Fig. 16. Shape inferences are highly constrained in certain neighborhoods of a shape, and less at others. (a) A shaded surface is well constrained
at (b) highlight points and (c) along boundaries and ridges. (d) Zooming in on a ridge, the red line defines a normal plane. Taking a cross
section along this plane shows (e) how the second derivative of intensity along the line constrains possible cross sections. Note that the tangent
plane changes. (f) Various cross sections and associated light sources with the tangent plane fixed; the projected light source hardly changes.
These two types of transformations characterize the possible cross sections and illustrate how constrained they are. Figure after [65].


Instead, a solution could be interpolated across the
ambiguous positions and anchored by the positions of
minimal ambiguity. This interpolation could be accomplished
by the manner in which shape is represented in higher
visual areas [90].

C.  Orientation-Based  Color  Processing 

While shading inferences were naturally expressed in
differential geometric terms, color would seem to be very
different. Typically, one thinks of the short-medium-long
wavelength retinal cones and the single opponent processing
in retinal ganglion cells [Fig. 17(a)]. Such opponency is
readily characterized by efficient coding principles [111]. But
something rather different emerges when nonlinear
dimensionality reduction techniques are used [Fig. 17(c)]. Munsell
patches can be viewed as a collection of points in wavelength
space. When these are projected by diffusion maps [21] to three
coordinates, the intensity-hue-saturation representation
emerges [10]. Now, attaching a unit vector to each image
position defines a flow in hue. Such flows have arisen in
image denoising and inpainting applications [117].

How might these hue flows be realized in primate
visual cortex? There is a rich representation of color
information in the form of oriented double-opponent cells
[109], shown in Fig. 17(b). Just as the receptive fields
shown in Fig. 1(b) provided an oriented contrast
measurement, one can also characterize oriented
color-contrast measurements. These would be Gabor-like filters
with (say) red-green subzones rather than dark-light
ones. Visual cortex goes one step further, however:
double-opponent oriented receptive fields contrast red-green
oriented opponency with green-red oriented opponency.
There are also oriented blue-yellow double-opponent flows.
These oriented double-opponent flows relate to the information
processing questions that we considered in the
Introduction (Fig. 18).

The  variation  of  pigment  across  the  surface  of  a  fruit 
suggests  another  type  of  ambiguity  in  images  even  more 
primitive  than  those  considered  in  Section  III-B:  which 
brightness  variations  correspond  to  shading  variations  and 
which  to  material  changes.  Interpreting  pigment  variations 
as  shading  variations  would  lead  to  huge  shape  errors. 

Color and brightness variations are correlated on
surfaces, which suggests checking for this [110]. While it
can be done locally or at edge points [118], the flow
structure is even richer: it exists across surfaces. Following
the cue in shading analysis, we seek isohue flows. These
are naturally expressed in the red-green/blue-yellow
oriented double-opponent basis [43]; see Fig. 19. Most
importantly, when the isophote and the isohue flows are
parallel, it means they are covarying over a region; this is
highly unlikely to occur naturally unless they have a
common source such as pigment variation. On the other
hand, when the flows are transverse, it implies that
structure is developing differently over a region. In this
latter case, the brightness information can be interpreted
as shading.

Fig. 17. Representation of color. (a) The retina and lateral geniculate
exhibit circular surround receptive fields that are single opponent in
brightness, red-green, and blue-yellow. (b) In visual cortex, cells
exhibit oriented, double-opponent receptive fields. (c) The
intensity-hue-saturation representation, in which hue lies on a circle.
Nonlinear embeddings of Munsell patches reveal this representation,
shown in (d) side and (e) top views. Natural objects (f) are rich in
color variation, as shown in the hue flow (g). Figure (b) after [109].
Figures (c)-(f) after [10].
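
The parallel-versus-transverse test can be written down in a few lines. This is a sketch of the comparison only, not the model of [43]: it assumes the two flows are already available as orientation fields in radians (defined modulo $\pi$), and the alignment threshold of 0.9 and the label strings are hypothetical choices.

```python
import numpy as np

def flow_alignment(shading_theta, hue_theta):
    """|cos| of the angle between two orientation fields (1 = parallel,
    0 = transverse). Orientations are in radians and defined modulo pi."""
    return np.abs(np.cos(shading_theta - hue_theta))

def label_brightness(shading_theta, hue_theta, parallel_above=0.9):
    """Label each position 'material' where the flows are nearly parallel
    and 'shading' otherwise. The threshold is a hypothetical choice."""
    aligned = flow_alignment(shading_theta, hue_theta) > parallel_above
    return np.where(aligned, "material", "shading")

# Three sample positions: nearly parallel, transverse, parallel modulo pi.
shading_theta = np.deg2rad(np.array([10.0, 10.0, 10.0]))
hue_theta = np.deg2rad(np.array([12.0, 100.0, 190.0]))
labels = label_brightness(shading_theta, hue_theta)
```

Using the absolute cosine makes the comparison insensitive to the sign of either flow, which matters because isophote and isohue orientations carry no preferred direction.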


Fig. 18. Color in cortex. (a) The model in Fig. 6 can be generalized to
postulate both red-green and blue-yellow double-opponent columns.
These provide a natural frame basis for isohue flows.
Fig. 19. Image of a mango showing the interaction of hue and
brightness. (a) The shading flow and (b) the isohue flow. When
compared in the highlighted region, it is clear that in some locations the
flows are parallel, indicating a material event, and in others the flows
are transverse, indicating that the brightness variation can be
interpreted as shading. Figure after [43].


A psychophysical demonstration illustrates how compelling
flow interaction can be (Fig. 20). Two colored
versions of a shaded image were created by adding
isoluminant color images: one with an isohue flow parallel
to the shading flow and the other transverse to the shading
flow. In the aligned, parallel case, the depth relief is
reduced, even though the brightness distribution remains
unchanged. It is as if this specific color pattern masks that
depth effect, thus providing a new role for color perception
different from shadow and boundary detection.


IV.  CONCLUSION 

Neural circuitry has inspired generations of biologically
motivated computer vision algorithms. Beginning with the
identification of receptive fields with edge operators, many
of the ingredients of computer vision classes are the same
as the ingredients of visual perception classes. While this
shows an intuitive identification of vision algorithms with
visual processes, the intuition is difficult to realize in
productive systems.

Fig. 20. Combining color and shading information. The gray-level
shaded figure has two different isoluminant color images added to it.
In the aligned case, the shading flow and the isohue flow are
parallel and the depth relief seems to disappear; in the unaligned
case, the color information appears “painted” onto the surface.
Figure after [43].

We  adopted  a  much  more  limited  view  in  this  review, 
based  on  the  centrality  of  orientation  fields  in  both  neural 
modeling  and  differential  geometry.  The  analogy  was 
established  around  the  bottom-up  boundary  detection 
problem,  and  developed  into  stereo,  shading,  and  color. 
The advantage for stereo was parallel realization of spatial
and orientation disparity in computing stereo correspondence.
The advantage for shading was the pullback of
transport operations on the shading flow to reveal curvature
forms on the surface. Finally, the advantage for color was
uncovering a role for isohue flows in a primitive
discrimination between surface and material changes.

These  results  are  concrete  and  can  be  put  into  practice 
for  computer  vision  applications.  Three  general  lessons 
emerged.  First,  there  exists  useful  high-order  structure  in 
the world for which geometry can serve as a proxy. This
was illustrated with edge statistics. Second, understanding
constraints  between  problems  can  help  to  make  them 
better  posed.  This  was  illustrated  by  the  color-shading 
interaction.  Third,  the  shading  analysis  suggests  that 
perhaps  one  should  not  seek  a  full,  global  solution  to  a 
problem,  especially  when  it  is  very  ill-posed.  Rather,  there 
may  be  islands  of  (almost)  well-posed  subproblems  within 
them  that  can  serve  as  anchors  for  a  more  general,  overall 
solution.  Nailing  3-D  structure  around  boundaries  and 
ridges  could  be  a  case  in  point.  Although  our  percepts  seem 
globally veridical, in fact much of what we perceive is a
hallucination. Perhaps this is all that our computational
vision algorithms should be asked to accomplish.